A Novel Efficient Graph Model for the Multiple Longest Common Subsequences (MLCS) Problem

نویسندگان

  • Zhan Peng
  • Yuping Wang
چکیده

Searching for the Multiple Longest Common Subsequences (MLCS) of multiple sequences is a classical NP-hard problem, which has been used in many applications. One of the most effective exact approaches for the MLCS problem is based on dominant point graph, which is a kind of directed acyclic graph (DAG). However, the time and space efficiency of the leading dominant point graph based approaches is still unsatisfactory: constructing the dominated point graph used by these approaches requires a huge amount of time and space, which hinders the applications of these approaches to large-scale and long sequences. To address this issue, in this paper, we propose a new time and space efficient graph model called the Leveled-DAG for the MLCS problem. The Leveled-DAG can timely eliminate all the nodes in the graph that cannot contribute to the construction of MLCS during constructing. At any moment, only the current level and some previously generated nodes in the graph need to be kept in memory, which can greatly reduce the memory consumption. Also, the final graph contains only one node in which all of the wanted MLCS are saved, thus, no additional operations for searching the MLCS are needed. The experiments are conducted on real biological sequences with different numbers and lengths respectively, and the proposed algorithm is compared with three state-of-the-art algorithms. The experimental results show that the time and space needed for the Leveled-DAG approach are smaller than those for the compared algorithms especially on large-scale and long sequences.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Fast Heuristic Search Algorithm for Finding the Longest Common Subsequence of Multiple Strings

Finding the longest common subsequence (LCS) of multiple strings is an NP-hard problem, with many applications in the areas of bioinformatics and computational genomics. Although significant efforts have been made to address the problem and its special cases, the increasing complexity and size of biological data require more efficient methods applicable to an arbitrary number of strings. In thi...

متن کامل

FACC: A Novel Finite Automaton Based on Cloud Computing for the Multiple Longest Common Subsequences Search

Searching for the multiple longest common subsequences MLCS has significant applications in the areas of bioinformatics, information processing, and data mining, and so forth, Although a few parallel MLCS algorithms have been proposed, the efficiency and effectiveness of the algorithms are not satisfactory with the increasing complexity and size of biologic data. To overcome the shortcomings of...

متن کامل

Efficient merged longest common subsequence algorithms for similar sequences

Given a pair of merging sequences A and B, and a target sequence T , the merged longest common subsequence (MLCS) problem is to find out a longest common subsequence (LCS) between sequences E(A,B) and T , where E(A,B) is obtained from merging two subsequences of A and B. In this paper, we propose an algorithm for solving the MLCS problem in O(L(r − L + 1)m) time, where r and L denote the length...

متن کامل

Development of Cache Oblivious Based Fast Multiple Longest Common Subsequence Technique(CMLCS) for Biological Sequences Prediction

A biological sequence is a single, continuous molecule of nucleic acid or protein. Classical methods for the Multiple Longest Common Subsequence problem (MLCS) problem are based on dynamic programming. The Multiple Longest Common Subsequence problem (MLCS) is used to find the longest subsequence shared between two or more strings. For over 30 years, significant efforts have been made to find ef...

متن کامل

Efficient Dominant Point Algorithms for the Multiple Longest Common Subsequence (MLCS) Problem

Finding the longest common subsequence of multiple strings is a classical computer science problem and has many applications in the areas of bioinformatics and computational genomics. In this paper, we present a new sequential algorithm for the general case of MLCS problem, and its parallel realization. The algorithm is based on the dominant point approach and employs a fast divide-and-conquer ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره 8  شماره 

صفحات  -

تاریخ انتشار 2017